Auto-tuned nested parallelism: A way to reduce the execution time of scientific software in NUMA systems

Authors

  • Jesús Cámara
  • Javier Cuenca
  • Luis-Pedro García
  • Domingo Giménez
Abstract

Scientific and engineering problems are solved with large parallel systems. In some cases those systems are NUMA: a large number of cores share a hierarchically organized memory. The computational kernel for those problems is BLAS or similar, so efficient use of these kernels gives a faster solution of a large range of scientific problems. Normally a multithreaded BLAS library optimized for the system is used, but the degradation in performance grows as the number of cores increases. This work presents an analysis of the behaviour in NUMA systems of an example high-level routine, an LU factorisation; an improved scheme that combines the multithreaded dgemm of BLAS with OpenMP nested parallelism; and an auto-tuning method that yields a reduction in the execution time.

Outline (PPAM 2012): Introduction; Computational systems; The software; Motivation.
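
The two-level idea named in the abstract (OpenMP threads on the outside, a multithreaded matrix product on the inside) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `nested_matmul` and `block_matmul` are hypothetical, and a hand-written loop stands in for the multithreaded BLAS dgemm so that the example stays self-contained. It assumes square row-major matrices and falls back to sequential execution when compiled without OpenMP.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical stand-in for a multithreaded dgemm call:
   computes rows [r0, r1) of C = A * B for row-major n x n matrices,
   sharing those rows among inner_threads threads. */
static void block_matmul(size_t n, size_t r0, size_t r1,
                         const double *A, const double *B, double *C,
                         int inner_threads)
{
    #pragma omp parallel for num_threads(inner_threads)
    for (long i = (long)r0; i < (long)r1; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (size_t k = 0; k < n; ++k)
                sum += A[(size_t)i * n + k] * B[k * n + j];
            C[(size_t)i * n + j] = sum;
        }
    }
}

/* Outer level: split the rows of C into outer_threads blocks, one per
   OpenMP thread. Inner level: each block is computed by its own team
   of inner_threads threads. */
void nested_matmul(size_t n, const double *A, const double *B, double *C,
                   int outer_threads, int inner_threads)
{
    assert(n > 0 && outer_threads > 0 && inner_threads > 0);
#ifdef _OPENMP
    omp_set_max_active_levels(2); /* allow two active parallel levels */
#endif
    #pragma omp parallel for num_threads(outer_threads)
    for (int b = 0; b < outer_threads; ++b) {
        size_t r0 = n * (size_t)b / (size_t)outer_threads;
        size_t r1 = n * (size_t)(b + 1) / (size_t)outer_threads;
        block_matmul(n, r0, r1, A, B, C, inner_threads);
    }
}
```

Calling `nested_matmul(n, A, B, C, 2, 4)` would use up to 2 x 4 threads. One plausible way to tune such a routine, offered here only as an illustration, is to time it for several (outer, inner) pairs on the target machine and keep the fastest.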

Similar resources

Multigrain Parallelization for Model-Based Design Applications Using the OSCAR Compiler

Model-based design is a very popular software development method for developing a wide variety of embedded applications such as automotive systems, aircraft systems, and medical systems. Model-based design tools like MATLAB/Simulink typically allow engineers to graphically build models consisting of connected blocks for the purpose of reducing development time. These tools also support automati...

Tiling and Scheduling of Three-level Perfectly Nested Loops with Dependencies on Heterogeneous Systems

Nested loops are one of the most time-consuming parts and the largest sources of parallelism in many scientific applications. In this paper, we address the problem of 3-dimensional tiling and scheduling of three-level perfectly nested loops with dependencies on heterogeneous systems. To exploit the parallelism, we tile and schedule nested loops with dependencies by awareness of computational po...

Parallélisme des nids de boucles pour l'optimisation du temps d'exécution et de la taille du code. (Nested loop parallelism to optimize execution time and code size)

The real time implementation algorithms always include nested loops which require important execution times. Thus, several nested loop parallelism techniques have been proposed with the aim of decreasing their execution times. These techniques can be classified in terms of granularity, which are the iteration level parallelism and the instruction level parallelism. In the case of the instructio...

A Tool Environment for Efficient Execution of Shared Memory Programs on NUMA Systems

One of the most important performance issues on NUMA systems is data locality since remote memory accesses have latencies several magnitudes higher than local memory accesses. This paper presents a tool environment targeting at tuning NUMA-based shared memory applications towards better memory locality. This tool environment comprises tools, supporting system facilities, and their interface. To...

A Clustering Approach to Scientific Workflow Scheduling on the Cloud with Deadline and Cost Constraints

One of the main features of High Throughput Computing systems is the availability of high power processing resources. Cloud Computing systems can offer these features through concepts like Pay-Per-Use and Quality of Service (QoS) over the Internet. Many applications in Cloud computing are represented by workflows. Quality of Service is one of the most important challenges in the context of sche...

Journal:
  • Parallel Computing

Volume 40, Issue

Pages -

Publication date: 2014